Breaking Traditions! FUDOKI Model Makes Multi-Modal Generation and Understanding More Flexible and Efficient
In recent years, the field of artificial intelligence has undergone tremendous change, with large language models (LLMs) in particular making remarkable progress on multi-modal tasks. These models have demonstrated strong potential in understanding and generating language, yet most current multi-modal models still adopt auto-regressive (AR) architectures, which confine inference to a fixed left-to-right generation order and limit its flexibility. To address this limitation, a research team from The University of Hong Kong and Huawei Noah's Ark Lab has proposed a new model, FUDOKI, that aims to break these constraints. The core innovation of FUDOKI is